Search results for "Web document"


Readability and the Web

2012

Readability indices measure how easy or difficult it is to read and comprehend a text. In this paper we look at the relation between readability indices and web documents from two different perspectives. On the one hand, we analyse how to reliably measure the readability of web documents by applying content extraction techniques and incorporating a bias correction. On the other hand, we investigate how web-based corpus statistics can be used to measure readability in a novel and language-independent way.

Keywords: web document readability; content extraction; corpus statistics
Subjects: readability; information retrieval; content extraction; bias correction; World Wide Web; web applications; computer networks and communications; information technology; computer science; languages and linguistics; artificial intelligence and image processing
Published in: Future Internet

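A readability index reduces surface statistics of a text to a single score. The abstract does not name the indices the paper uses, so purely as an illustration, the Python sketch below computes the well-known Flesch Reading Ease score, 206.835 - 1.015 × (words/sentences) - 84.6 × (syllables/words), on text that is assumed to have already been reduced to its main content by a content extraction step. The vowel-group syllable counter and the regex-based sentence and word splitting are simplifying assumptions, and the paper's bias correction is not reproduced here.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate (an assumption, not a dictionary lookup):
    count groups of consecutive vowels, then drop one for a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1 syllable, but keep "table" -> 2 syllables.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores mean easier text.
    Assumes `text` is the extracted main content of a web page."""
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat."), 1))
```

For the one-sentence example, the score is about 116.1: very short sentences made of one-syllable words can score above 100. Running the same function on a raw HTML page, with its menus and link lists, would distort the word and sentence counts, which is why a content extraction step comes first.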

Graphical information models as interfaces for Web document repositories

2000

In interorganisational processes, documents are used to record information created during the processes. Examples include legislative processes involving several legislative organisations and manufacturing processes involving complicated networks of companies and officials. In contemporary computerised environments, much of the recorded information is scattered across different kinds of Web repositories with different kinds of interfaces. The repositories should serve as valuable knowledge assets, but they may be difficult to use, and even knowledge of which repositories are available may be insufficient. The paper presents a method for improving information manag…

Subjects: information management; Web documents; metadata; information models; graphical models; XML; telematics; World Wide Web; European Commission; computer science
Published in: Proceedings of the Working Conference on Advanced Visual Interfaces


Combining content extraction heuristics

2008

The main text content of an HTML document on the WWW is typically surrounded by additional content, such as navigation menus, advertisements, link lists or design elements. Content Extraction (CE) is the task of identifying and extracting the main content. Ongoing research has produced several CE heuristics of varying quality. However, so far only the Crunch framework combines several heuristics to improve its overall CE performance, and many new algorithms have been proposed since Crunch was introduced. The CombinE system is designed to test, evaluate and optimise combinations of CE heuristics. Its aim is to develop CE systems that yield better and more reliable extracts of the main content of a web …

Subjects: content extraction; heuristics; Crunch; information retrieval; data mining; Web documents; design elements and principles; computer science
Published in: Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services

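The abstract does not say how CombinE combines heuristics. As one hypothetical scheme, the Python sketch below treats each CE heuristic as a function that labels every text block of a page as main content or not, then combines the labels with a weighted majority vote; tuning the weights is one possible way to "optimise" a combination. The three heuristics (`long_block`, `ends_like_sentence`, `no_separator`) are toy placeholders invented for this example, not published CE algorithms such as those used in Crunch.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: a CE heuristic labels each text block of a page,
# True meaning "this block is part of the main content".
Heuristic = Callable[[Sequence[str]], List[bool]]


def long_block(blocks: Sequence[str], min_words: int = 10) -> List[bool]:
    """Toy heuristic: main content tends to come in longer blocks."""
    return [len(b.split()) >= min_words for b in blocks]


def ends_like_sentence(blocks: Sequence[str]) -> List[bool]:
    """Toy heuristic: running text usually ends with terminal punctuation."""
    return [b.rstrip().endswith((".", "!", "?")) for b in blocks]


def no_separator(blocks: Sequence[str]) -> List[bool]:
    """Toy heuristic: navigation menus often put '|' between links."""
    return ["|" not in b for b in blocks]


def combine_by_vote(blocks: Sequence[str],
                    heuristics: Sequence[Heuristic],
                    weights: Optional[Sequence[float]] = None) -> List[bool]:
    """Keep a block when the total weight of heuristics voting for it
    exceeds half of the combined weight of all heuristics."""
    if weights is None:
        weights = [1.0] * len(heuristics)
    if len(weights) != len(heuristics):
        raise ValueError("need exactly one weight per heuristic")
    votes = [h(blocks) for h in heuristics]
    half = sum(weights) / 2
    return [sum(w for w, v in zip(weights, votes) if v[i]) > half
            for i in range(len(blocks))]


if __name__ == "__main__":
    page = [
        "Home | News | Sports | Contact",
        "The city council approved the new budget on Monday after a long "
        "debate about funding for schools and public transport.",
        "Advertisement",
    ]
    heuristics = [long_block, ends_like_sentence, no_separator]
    mask = combine_by_vote(page, heuristics)
    print([b for b, keep in zip(page, mask) if keep])
```

With equal weights, only the news paragraph is kept. With weights `[0.2, 0.2, 2.0]`, the single `no_separator` vote outweighs the other two, and the "Advertisement" block is wrongly kept as well. This is why weights would need to be tuned against evaluated extracts rather than set by hand.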

Experimental BIM applications in Archaeology: a work-flow

2014

In the last few decades, various conceptual models, methods and techniques have been studied to allow 3D digital access to Cultural Heritage (CH). Among these is BIM (Building Information Modeling): originally developed for construction projects, it has already been tested in the CH domain, but not sufficiently in the archaeological field. This paper illustrates a framework for creating 3D archaeological models integrated with databases using BIM. The implemented models can be queried through a connection to a Relational Database Management System and shared on the web. Parametric solid and semantic models are integrated with 3D standardized database models that are finally manageable in the pub…

Keywords: BIM; archaeology; semantic modelling; data export; web documentation
Subjects: building information modeling; cultural heritage; archaeology; relational database management systems; database models; cloud computing; workflow; computer science

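The abstract states that the BIM models can be queried through a relational database management system. The schema below is entirely hypothetical. It assumes that each 3D model element has a GUID (as IFC GlobalId values do) and that archaeological attributes such as period and material live in a separate table keyed by that GUID. The sketch uses Python's built-in `sqlite3` module in place of whatever RDBMS the authors actually used. The table names, columns and sample rows are invented for illustration.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema linking
    BIM model elements to archaeological attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE element (
            guid TEXT PRIMARY KEY,        -- element id in the 3D model
            element_type TEXT NOT NULL    -- e.g. wall, floor, column
        );
        CREATE TABLE archaeological_record (
            record_id INTEGER PRIMARY KEY,
            guid TEXT NOT NULL REFERENCES element(guid),
            period TEXT,
            material TEXT
        );
    """)
    conn.executemany("INSERT INTO element VALUES (?, ?)",
                     [("0001", "Wall"), ("0002", "Floor"), ("0003", "Column")])
    conn.executemany("INSERT INTO archaeological_record VALUES (?, ?, ?, ?)",
                     [(1, "0001", "Roman", "stone"),
                      (2, "0002", "Medieval", "brick"),
                      (3, "0003", "Roman", "marble")])
    conn.commit()
    return conn


def elements_by_period(conn: sqlite3.Connection,
                       period: str) -> List[Tuple[str, str]]:
    """Return (guid, element_type) for every model element recorded under
    `period`, ordered by GUID."""
    rows = conn.execute(
        "SELECT e.guid, e.element_type "
        "FROM element AS e "
        "JOIN archaeological_record AS r ON r.guid = e.guid "
        "WHERE r.period = ? ORDER BY e.guid",
        (period,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(elements_by_period(db, "Roman"))
```

Keeping the GUID as the join key means the geometry can stay in the BIM model while the descriptive data stays in the database. A web viewer could then, for example, highlight the returned elements. This separation is one plausible reading of "integrated with databases", not a description of the paper's implementation.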